Search results for "Training set"
showing 10 items of 68 documents
Extreme minimal learning machine: Ridge regression with distance-based basis
2019
The extreme learning machine (ELM) and the minimal learning machine (MLM) are nonlinear and scalable machine learning techniques with a randomly generated basis. Both techniques start with a step in which a matrix of weights for the linear combination of the basis is recovered. In the MLM, the feature mapping in this step corresponds to distance calculations between the training data and a set of reference points, whereas in the ELM, a transformation using a radial or sigmoidal activation function is commonly used. Computation of the model output, for prediction or classification purposes, is straightforward with the ELM after the first step. In the original MLM, one needs to solve an addit…
Eawag-Soil in enviPath: a new resource for exploring regulatory pesticide soil biodegradation pathways and half-life data.
2017
Developing models for the prediction of microbial biotransformation pathways and half-lives of trace organic contaminants in different environments requires as training data easily accessible and sufficiently large collections of respective biotransformation data that are annotated with metadata on study conditions. Here, we present the Eawag-Soil package, a public database that has been developed to contain all freely accessible regulatory data on pesticide degradation in laboratory soil simulation studies for pesticides registered in the EU (282 degradation pathways, 1535 reactions, 1619 compounds and 4716 biotransformation half-life values with corresponding metadata on study conditions)…
Dry selection and wet evaluation for the rational discovery of new anthelmintics
2017
Helminth infections remain a major problem in medicine and public health. In this report, atom-based 2D bilinear indices, a TOMOCOMD-CARDD (QuBiLs-MAS module) molecular descriptor family, and linear discriminant analysis (LDA) were used to find models that differentiate between anthelmintic and non-anthelmintic compounds. Two classification models, obtained using non-stochastic and stochastic 2D bilinear indices, correctly classified 86.64% and 84.66% of the compounds in the training set, respectively. Equation 1 (2) correctly classified 141 (135) of 165 compounds [85.45% (81.82%)] in the external validation set. Further LDA models were built to identify the most likely mechanism of action of anthelmin…
Window of implantation transcriptomic stratification reveals different endometrial subsignatures associated with live birth and biochemical pregnancy
2017
Objective: To refine the endometrial window of implantation (WOI) transcriptomic signature by defining new subsignatures associated with live birth and biochemical pregnancy. Design: Retrospective cohort study. Setting: University-affiliated in vitro fertilization clinic and reproductive genetics laboratory. Patient(s): Healthy fertile oocyte donors (n = 79) and patients with infertility diagnosed by Endometrial Receptivity Analysis (n = 771). Intervention(s): None. Main Outcome Measure(s): WOI transcriptomic signatures associated with specific reproductive outcomes. Result(s): The retrospective cohort study was designed to develop a prediction model based on transcriptomic clusters for endometrial …
Modeling Chronic Toxicity: A Comparison of Experimental Variability With (Q)SAR/Read-Across Predictions
2018
This study compares the accuracy of (Q)SAR/read-across predictions with the experimental variability of chronic lowest-observed-adverse-effect levels (LOAELs) from in vivo experiments. We could demonstrate that predictions of the lazy structure-activity relationships (lazar) algorithm within the applicability domain of the training data have the same variability as the experimental training data. Predictions with a lower similarity threshold (i.e., a larger distance from the applicability domain) are also significantly better than random guessing, but the errors to be expected are higher and a manual inspection of prediction results is highly recommended.
A Simple Method to Predict Blood-Brain Barrier Permeability of Drug-Like Compounds Using Classification Trees
2017
Background: Determining the ability of a compound to penetrate the blood-brain barrier (BBB) is a challenging task; despite numerous efforts to predict or measure BBB passage, existing approaches still have several drawbacks. Methods: Permeability through the BBB is predicted using classification trees. A large, recently published data set of 497 compounds is used to develop the tree model. Results: The best model shows an accuracy higher than 87.6% for the training set; the model was also validated using a 10-fold cross-validation procedure and a test set, achieving accuracy values of 86.1% and 87.9%, respectively. We give a brief explanation, in structural terms, o…
Biomarker discovery study of inflammatory proteins for colorectal cancer early detection demonstrated importance of screening setting validation
2018
Objectives: Most studies identifying inflammatory markers for early detection of colorectal cancer (CRC) were conducted using clinically manifest cases. We aimed to identify circulating inflammatory biomarkers for early detection of CRC and validate them in both a clinical setting and a true screening setting. Study Design and Setting: A total of 92 inflammatory proteins were quantified in baseline plasma samples from individuals clinically diagnosed with CRC and neoplasm-free controls matched on age and sex (training set). A multimarker panel was selected and evaluated in samples from another clinical setting (validation set C) and a screening setting (validation set S). Results: In …
Radiomic Machine Learning Classifiers in Spine Bone Tumors: A Multi-Software, Multi-Scanner Study
2021
Purpose: Spinal lesion differential diagnosis remains challenging even in MRI. Radiomics and machine learning (ML) have proven useful even in absence of a standardized data mining pipeline. We aimed to assess ML diagnostic performance in spinal lesion differential diagnosis, employing radiomic data extracted by different software. Methods: Patients undergoing MRI for a vertebral lesion were retrospectively analyzed (n = 146, 67 males, 79 females; mean age 63 ± 16 years, range 8-89 years) and constituted the train (n = 100) and internal test cohorts (n = 46). Part of the latter had additional prior exams which constituted a multi-scanner, external test cohort (n = 35). Lesions were la…
Screening for gastric cancer using exhaled breath samples.
2019
Background: The aim was to derive a breath-based classifier for gastric cancer using a nanomaterial-based sensor array, and to validate it in a large screening population. Methods: A new training algorithm for the diagnosis of gastric cancer was derived from previous breath samples from patients with gastric cancer and healthy controls in a clinical setting, and validated in a blinded manner in a screening population. Results: The training algorithm was derived using breath samples from 99 patients with gastric cancer and 342 healthy controls, and validated in a population of 726 people. The calculated training set algorithm had 82 per cent sensitivity, 78 per cent specificity and 79 …
Comparative study to predict toxic modes of action of phenols from molecular structures.
2013
Quantitative structure-activity relationship models for the prediction of mode of toxic action (MOA) of 221 phenols to the ciliated protozoan Tetrahymena pyriformis using atom-based quadratic indices are reported. The phenols represent a variety of MOAs including polar narcotics, weak acid respiratory uncouplers, pro-electrophiles and soft electrophiles. Linear discriminant analysis (LDA), and four machine learning techniques (ML), namely k-nearest neighbours (k-NN), support vector machine (SVM), classification trees (CTs) and artificial neural networks (ANNs), have been used to develop several models with higher accuracies and predictive capabilities for distinguishing between four MOAs. M…